Fisher discriminant model based on LASSO logistic regression for computed tomography imaging diagnosis of pelvic rhabdomyosarcoma in children

Computed tomography (CT) has been widely used for the diagnosis of pelvic rhabdomyosarcoma (RMS) in children. However, it is difficult to differentiate pelvic RMS from other pelvic malignancies. This study aimed to analyze and select CT features by using least absolute shrinkage and selection operator (LASSO) logistic regression and to establish a Fisher discriminant analysis (FDA) model for the quantitative diagnosis of pediatric pelvic RMS. A total of 121 pediatric patients who were diagnosed with pelvic neoplasms were included in this study. The patients were assigned to an RMS group (n = 36) and a non-RMS group (n = 85) according to the pathological results. LASSO logistic regression was used to select characteristic features, and an FDA model was constructed for quantitative diagnosis. Leave-one-out cross-validation and receiver operating characteristic (ROC) curve analysis were used to evaluate the diagnostic ability of the FDA model. Six characteristic variables were selected by LASSO logistic regression, all of which were CT morphological features.
Using these CT features, the following diagnostic models were established: (RMS group) $G_1 = -14.283 + 6.613x_1 + 5.333x_2 + 5.753x_3 + 12.361x_4 + 8.095x_5 - 0.715x_6$; (non-RMS group) $G_2 = -2.008 + 3.539x_1 + 1.080x_2 + 1.154x_3 + 2.307x_4 + 1.656x_5 + 1.380x_6$, where $x_1$, $x_2$, …, $x_6$ are lower than normal muscle density (1 = yes; 0 = no), multinodular fusion (1 = yes; 0 = no), enhancement at surrounding blood vessels (1 = yes; 0 = no), heterogeneous progressive centripetal enhancement (1 = yes; 0 = no), ring enhancement (1 = yes; 0 = no), and hemorrhage (1 = yes; 0 = no), respectively.
The calculated area under the ROC curve (AUC) of the model was 0.992 (0.982–1.000), with a sensitivity of 94.4%, a specificity of 96.5%, and an accuracy of 95.9%. The calculated sensitivity, specificity and accuracy values were consistent with those from cross-validation. An FDA model based on the CT morphological features of pelvic RMS was established and could provide an easy and efficient method for the diagnosis and differential diagnosis of pelvic RMS in children.
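As a quick consistency check, the reported sensitivity, specificity and accuracy can be reproduced from the two group sizes. The correctly classified counts used below (34 of 36 RMS cases, 82 of 85 non-RMS cases) are inferred from the reported percentages; they are an assumption and are not stated in the text.

```python
# Group sizes are taken from the abstract. The counts of correctly classified
# cases are ASSUMED (inferred from the reported percentages), not reported.
N_RMS, N_NON_RMS = 36, 85
TRUE_POS, TRUE_NEG = 34, 82


def diagnostic_metrics(tp, tn, n_pos, n_neg):
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = tp / n_pos
    specificity = tn / n_neg
    accuracy = (tp + tn) / (n_pos + n_neg)
    return sensitivity, specificity, accuracy


se, sp, acc = diagnostic_metrics(TRUE_POS, TRUE_NEG, N_RMS, N_NON_RMS)
print(f"sensitivity={se:.1%}  specificity={sp:.1%}  accuracy={acc:.1%}")
```

Under these assumed counts the three values round to the percentages quoted above, which shows that the reported figures are mutually consistent with the stated group sizes.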

www.nature.com/scientificreports/

Rhabdomyosarcoma (RMS) is a soft tissue sarcoma that accounts for approximately 4.5% of pediatric cancer cases and is characterized by a high degree of malignancy, infiltration of adjacent tissues, and early lymph node and distant metastases 1 . The pelvic cavity is one of the most common sites, second only to the head and neck 2 . Accurate preoperative diagnosis is very important for appropriate surgical planning and could lead to a better prognostic outcome. In recent years, medical imaging technology has made great progress and is playing an increasingly important role in the diagnosis of tumours. Of the many imaging modalities, computed tomography (CT) has the advantages of noninvasiveness, high spatial and density resolution, and high scanning speed and has been widely applied for the assessment of tumours in the abdomen, pelvis and thorax. CT images allow radiologists to determine not only the extent of tumours but also the absence or presence of bony destruction, calcification, haemorrhage and/or metastases 3 . In routine radiological diagnosis, however, RMS has the general radiological appearance of soft tissue tumors, so it is difficult to distinguish it from other pelvic soft tissue malignancies. To our knowledge, there are currently few reports in the literature on the CT features of pelvic RMS [4][5][6][7][8][9] , and some reports have indicated that the CT findings of pelvic RMS lack specificity [10][11][12] . Therefore, it is imperative to investigate the CT features of pelvic RMS and establish an accurate diagnostic method for pelvic RMS in children.
Least absolute shrinkage and selection operator (LASSO), first proposed by Robert Tibshirani in 1996, is a regression analysis method that reduces the dimensionality of data. As a regularized estimation method, LASSO shrinks the coefficients of variables and forces certain regression coefficients to 0 by constructing a penalty function, so it is often adopted for variable selection before establishing prediction and diagnostic models 13,14 . Fisher discriminant analysis (FDA) is a classic method for identifying linear functions of variables to distinguish samples of different groups 15 . It was proposed by Fisher in 1936 and serves as a dimensionality reduction method that finds a linear combination of features that maximizes the between-class differences and minimizes the within-class variation 16 . FDA has been widely used in many fields, including medical research 17 . To date, a variety of studies have applied FDA to predict, diagnose or identify diseases [18][19][20] . A study by Ni et al. 18 applied FDA to predict clinicopathological subtypes of breast cancer based on the radiological features of diffusion-weighted magnetic resonance imaging (MRI) and suggested that FDA was a promising method for this purpose. In a report by Zou et al. 19 , FDA was applied to classify autism spectrum disorders (ASDs) based on folate-related metabolic markers, and the results showed that the FDA model could effectively distinguish ASD patients from healthy controls. Hao et al. 20 used FDA to establish a discriminant formula to distinguish patients with gastric cancer and colorectal cancer from healthy controls and achieved good results.
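For the two-group case, the criterion described above (large between-class difference, small within-class variation) is usually written as a ratio to be maximized. The symbols $\mathbf{w}$, $\boldsymbol{\mu}_k$, $S_B$ and $S_W$ below are standard textbook notation introduced here for illustration; they do not appear in the original text.

```latex
% Fisher criterion for two groups G_1 and G_2 (textbook form, notation assumed)
J(\mathbf{w}) = \frac{\mathbf{w}^{\top} S_B \, \mathbf{w}}{\mathbf{w}^{\top} S_W \, \mathbf{w}},
\qquad
S_B = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^{\top},
\qquad
S_W = \sum_{k=1}^{2} \sum_{i \in G_k}
      (\mathbf{x}_i - \boldsymbol{\mu}_k)(\mathbf{x}_i - \boldsymbol{\mu}_k)^{\top}

% The maximizer is determined up to scale:
\mathbf{w} \;\propto\; S_W^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)
```

Here $\boldsymbol{\mu}_k$ is the mean feature vector of group $k$, $S_B$ measures the separation of the group means, and $S_W$ pools the scatter within each group.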
In this study, we used LASSO logistic regression to evaluate and select valuable CT features. Based on these features, an FDA model for pelvic RMS diagnosis was established, and the diagnostic accuracy was validated. Our aim was to develop a simple and accurate diagnostic method for the quantitative diagnosis of pelvic RMS by means of mathematical statistics.
We present the following article in accordance with the TRIPOD checklist.

Materials and methods
Study subjects. Clinical data and pathological results were obtained from patients' medical records. Imaging data were retrieved from our picture archiving and communication system.
CT imaging and feature extraction. All pediatric patients in our study underwent pelvic plain and contrast-enhanced CT examinations on a 256-slice spiral CT system (Philips Brilliance iCT, Philips, Netherlands) with a low tube voltage of 80-100 kV, a low tube current of 100-200 mA (the current varied during the acquisition and according to the child's body weight), a rotation time of 0.4 s, a pitch of 0.925, a collimation of 128 × 0.625 mm, a slice thickness of 5.0 mm and a reconstruction layer thickness of 1 mm. To obtain arterial and venous phase contrast-enhanced images, dual-phase dynamic contrast-enhanced CT was performed at 30 and 65 s, respectively, after intravenous injection of the contrast agent (Omnipaque, 350 mg/mL, Amersham Healthcare, Shanghai, China). The contrast agent was administered at a dose of 2 mL/kg body weight, with a maximum of 80 mL. In routine radiological diagnosis of tumor lesions, attention is paid mainly to morphology, density, margin, enhancement mode and metastasis. Therefore, our study selected image features from these five aspects to explore the differences between the two groups (RMS group and non-RMS group). According to previous literature and diagnostic experience, the following CT features of the tumors were evaluated: (a) morphology (multinodular fusion/lobulated/round/orbicular); (b) density (lower or higher than normal muscle density/calcification/hemorrhage/necrosis); (c) margin (clear or unclear); (d) contrast enhancement modes (surrounding blood vessels/heterogeneous progressive centripetal enhancement/ring enhancement/grape-cluster enhancement); (e) metastasis (lymphatic metastasis/bone erosion).
The evaluation criteria for the less common CT features above are as follows: (1) multinodular fusion: in the CT images, multiple nodules of different sizes were observed in the pelvic cavity, and some of the nodules were fused together into a lobulated mass 21 ; (2) surrounding blood vessels: multiple strip-like and punctate vascular shadows can be seen in the mass on contrast-enhanced CT images 21 ; (3) heterogeneous progressive centripetal enhancement: on dynamic contrast-enhanced CT images, the mass shows peripheral annular inhomogeneous enhancement in the arterial phase and gradually centripetal inhomogeneous enhancement in the venous and delayed phases 21 ; (4) grape-cluster enhancement: when the mass is in a hollow structure (vagina, bladder, etc.), a mass resembling a grape cluster appears on the contrast-enhanced CT image 22 ; (5) lymph node metastasis: a short diameter ≥ 1.5 cm for level I and II cervical lymph nodes and inguinal lymph nodes, a short diameter ≥ 1 cm for other cervical lymph nodes, or a degree of enhancement significantly higher than that of muscle tissue 23 . Prospective evaluation of the CT images for each patient was independently performed by two abdominal radiologists with 10-20 years of experience. The abdominal radiologists were blinded to patient characteristics and histologic results and evaluated the morphology, density, margin, enhancement modes and metastasis of the tumors. In case of disagreement between the two radiologists, a consensus was reached with a third senior abdominal radiologist and two other abdominal radiologists.
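The weight-based contrast dosing rule stated in the imaging protocol (2 mL/kg body weight, capped at 80 mL) can be written as a one-line calculation. The function name and the example body weights below are hypothetical and serve only to illustrate the arithmetic.

```python
def contrast_volume_ml(weight_kg, dose_ml_per_kg=2.0, max_ml=80.0):
    """Contrast volume under the stated rule: 2 mL/kg body weight, at most 80 mL."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return min(weight_kg * dose_ml_per_kg, max_ml)


# Hypothetical body weights (kg), chosen to show both sides of the cap.
for weight in (12.0, 25.0, 40.0, 55.0):
    print(f"{weight:5.1f} kg -> {contrast_volume_ml(weight):5.1f} mL")
```

The cap is reached at 40 kg, so heavier children receive the fixed maximum of 80 mL.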
Statistical analysis. The patients were divided into an RMS group and a non-RMS group according to the pathological results. Continuous variables that followed a normal distribution are described as the mean and standard deviation (SD), and a parametric t test was used to determine the statistical significance of differences between the two groups. Otherwise, the variables are described as medians and interquartile ranges (IQRs), and a nonparametric Mann-Whitney U test was used for comparisons between the two groups. Categorical variables are expressed as the number of patients and respective percentage, and the χ² test or Fisher's exact test was used to compare the rates.
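To illustrate how a binary CT feature can be compared between the two groups, the following minimal sketch computes the χ² statistic and a two-sided Fisher's exact P value for a 2 × 2 table using only the Python standard library. The counts in `table` are hypothetical (only the row totals of 36 and 85 match the group sizes), and the study itself used R and SPSS, not this code.

```python
from math import comb


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# HYPOTHETICAL counts: rows = RMS / non-RMS, columns = feature present / absent.
table = (30, 6, 10, 75)
print("chi-square:", round(chi_square_2x2(*table), 2))
print("Fisher P < 0.05:", fisher_exact_two_sided(*table) < 0.05)
```

With one degree of freedom, a χ² statistic above 3.841 corresponds to P < 0.05, the significance threshold used in this study.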
LASSO logistic regression was used to select the optimal characteristic features for diagnosing RMS from the basic and CT morphological features of the patients. The penalty parameter λ was optimized, and the variables with nonzero coefficients in the resulting model were selected as the diagnostic variables. Based on these variables, an FDA model was established as a quantitative diagnostic model of pediatric pelvic RMS. The diagnostic ability of this model was evaluated by the receiver operating characteristic (ROC) curve. Additionally, the cumulative diagnostic ability of the features was analyzed.
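The selection step can be sketched as follows: an L1-penalized logistic regression is fitted, and the variables whose coefficients remain nonzero are kept. This is a minimal standard-library illustration using proximal gradient descent on synthetic binary data; the data, the fixed penalty `lam`, the step size and all function names are assumptions, whereas the study tuned λ and ran the analysis in R.

```python
import math
from itertools import product


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward 0 by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(X, y, lam, step=0.5, n_iter=1000):
    """Minimize mean log-loss + lam * ||w||_1 by proximal gradient descent.

    The intercept b is not penalized. Returns (w, b).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            resid = 1.0 / (1.0 + math.exp(-z)) - yi   # predicted prob - label
            grad_b += resid / n
            for j in range(p):
                grad_w[j] += resid * xi[j] / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# SYNTHETIC data: 4 binary features; only features 0 and 1 affect the label.
# Each feature pattern is replicated 4 times with 1 + x0 + x1 positive labels.
X, y = [], []
for pattern in product((0, 1), repeat=4):
    n_pos = 1 + pattern[0] + pattern[1]
    for k in range(4):
        X.append(list(pattern))
        y.append(1 if k < n_pos else 0)

w, b = lasso_logistic(X, y, lam=0.03)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("coefficients:", [round(wj, 3) for wj in w])
print("selected feature indices:", selected)
```

In this toy design the two uninformative features are shrunk exactly to zero, which is the property that makes the LASSO usable as a variable selector before the discriminant model is built.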
A two-tailed P < 0.05 was considered to be statistically significant. All statistical analyses were performed using R 3.6.1 software and SPSS 23.0 software (IBM, Armonk, New York, USA).
Establishing the model. FDA is a classical approach to identify a linear function of variables that distinguishes samples from different groups as much as possible 13 . In our study, patients with RMS and without RMS were set as the two groups: $G_1$ (RMS group) and $G_2$ (non-RMS group). A total of 6 CT features ($x_i$) were used as diagnostic variables to establish a linear discriminant function of the form $G = c_0 + c_1 x_1 + c_2 x_2 + \cdots + c_6 x_6$. By using the discriminant rule, the result of the examination was found to belong to $G_1$ or $G_2$.
(1) Raw data matrix: two matrices ($W_1$, $W_2$) were constructed for $G_1$ and $G_2$, with CT features as the columns and pediatric patients as the rows.
(2) The column means of matrices $W_1$ and $W_2$ were calculated.
(3) The coefficients $c_i$ were calculated using the differential calculus method.
(4) The discriminant function was defined from these coefficients.
(5) The discriminant values represented by $G_1$ and $G_2$ were calculated.
(6) The value of the Fisher discrimination function at the centroids, $G_0$, was obtained.
(7) The above calculation was conducted using SPSS software with the following discriminant rule: if $G_1 > G_0$, the sample belongs to $G_1$, which means that the sample belongs to the RMS group. Otherwise, the sample belongs to $G_2$; that is, the sample belongs to the non-RMS group.
(8) The resulting discriminant functions for classification were also calculated using SPSS.
The values of $G_1$ and $G_2$ were calculated by substituting the CT features into the functions. By comparing the $G_1$ and $G_2$ values, the subjects were classified according to the following principle: if $G_1 > G_2$, the subjects were classified into the RMS group; if $G_1 < G_2$, the subjects were classified into the non-RMS group.
Notations: $G_i$: population of disease, $i = 1, 2$; $W_i$: data matrix of $G_i$; $c_i$: coefficients of Fisher's discriminant function, $i = 0, 1, 2, \ldots, 6$; $G_0$: value of the Fisher discrimination function at the centroids; $x_{ij}$: value of the $j$th CT feature in the $i$th patient.
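The classification principle above can be sketched in a few lines using the two classification functions reported in this article. The coefficients are copied from the text; the feature names, the function names and the handling of a tie ($G_1 = G_2$, which the text does not address) are assumptions made for illustration.

```python
# Coefficients (intercept, x1..x6) copied from the reported functions.
G1_COEF = (-14.283, 6.613, 5.333, 5.753, 12.361, 8.095, -0.715)  # RMS group
G2_COEF = (-2.008, 3.539, 1.080, 1.154, 2.307, 1.656, 1.380)     # non-RMS group

# Order of x1..x6 as defined in the text (identifier names are hypothetical).
FEATURES = (
    "lower_than_normal_muscle_density",
    "multinodular_fusion",
    "enhancement_at_surrounding_blood_vessels",
    "heterogeneous_progressive_centripetal_enhancement",
    "ring_enhancement",
    "hemorrhage",
)


def discriminant_score(coef, x):
    """Evaluate c0 + c1*x1 + ... + c6*x6 for binary feature values x."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def classify(x):
    """Return (label, G1, G2) for six binary CT features (1 = yes, 0 = no)."""
    if len(x) != len(FEATURES) or any(v not in (0, 1) for v in x):
        raise ValueError("x must contain six values, each 0 or 1")
    g1 = discriminant_score(G1_COEF, x)
    g2 = discriminant_score(G2_COEF, x)
    # ASSUMPTION: a tie (G1 == G2) is assigned to the non-RMS group.
    return ("RMS" if g1 > g2 else "non-RMS"), g1, g2


# Hypothetical case: low density and centripetal enhancement present.
label, g1, g2 = classify((1, 0, 0, 1, 0, 0))
print(label, round(g1, 3), round(g2, 3))
```

Because every input is 0 or 1, each score is simply the intercept plus the coefficients of the features that are present, so the rule can be applied by hand or in a spreadsheet.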

Model validation.
Leave-one-out cross-validation, in which each case is classified using a classification formula derived from all other cases, was used to validate the accuracy of the model. In addition, ROC curves were used to validate the accuracy of the model, where an area under the ROC curve (AUC) between 0.5 and 0.7 represented a low diagnostic value, an AUC between 0.7 and 0.9 represented a medium diagnostic value, and an AUC above 0.9 represented a high diagnostic value 24 .
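Both validation steps can be sketched generically. The code below implements a leave-one-out loop and a rank-based AUC with the interpretation bands quoted above. A nearest-centroid classifier is used as a simple stand-in for the FDA model, and the toy data, the function names and the treatment of values lying exactly on a band boundary are assumptions.

```python
def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one
    (ties count as 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def diagnostic_value(a):
    """Bands quoted in the text; boundary handling is an assumption."""
    if a > 0.9:
        return "high"
    if a > 0.7:
        return "medium"
    if a >= 0.5:
        return "low"
    return "not defined in the text"


def leave_one_out_accuracy(X, y, fit, predict):
    """Classify each case with a model fitted on all other cases."""
    hits = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)


# Stand-in classifier (NOT the FDA model): assign to the nearest group centroid.
def fit_centroids(X, y):
    centroids = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    return min(centroids,
               key=lambda l: sum((a - b) ** 2 for a, b in zip(centroids[l], x)))


# Toy one-dimensional data (hypothetical).
X_toy = [[0.0], [0.2], [0.4], [1.0], [1.2], [1.4]]
y_toy = [0, 0, 0, 1, 1, 1]
print("LOO accuracy:", leave_one_out_accuracy(X_toy, y_toy,
                                              fit_centroids, predict_centroid))
print("AUC band for 0.992:", diagnostic_value(0.992))
```

Passing `fit` and `predict` as arguments keeps the loop independent of the classifier, so the same skeleton applies to any discriminant model.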
Ethical statement. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Institutional Ethics Committee of Children's Hospital Affiliated to Chongqing Medical University and individual consent for this retrospective analysis was waived.

Results
Basic and CT morphological features of the patients. RMS … heterogeneous progressive centripetal enhancement, enhancement at surrounding blood vessels, multinodular fusion, lower than normal muscle density, hemorrhage and ring enhancement (Fig. 3). The sensitivity, specificity, AUC and 95% confidence interval of the 6 selected features are shown in Table 3 in order of importance. The sensitivity, specificity and AUC of the FDA score were significantly higher than those of each single CT characteristic feature. Next, the feature importance was further evaluated using cumulative Fisher discriminant models following the previously determined order of importance (Fig. 4). After the fourth cumulated feature, there was no significant improvement in the diagnostic ability of the discriminant models. Namely, there was no significant difference in the resulting AUCs (all above 0.96) for the first 4 indicators, the first 5 indicators and all 6 indicators (all P > 0.05). This result suggested that RMS can also be diagnosed accurately with fewer indicators, which helps to save resources.

Discussion
During routine radiological diagnosis, RMS is difficult to distinguish from other pelvic tumors except malignant teratoma, which involves characteristic calcification and fatty composition 10 . Previous studies have shown some CT features of RMS, such as tumoral necrosis, lower density than muscle, clear margin, and susceptibility to lymph node and/or bone metastasis [25][26][27][28][29][30][31][32][33][34][35][36][37] . However, the specificity and sensitivity of these CT features have not been systematically analyzed due to the limited number of study samples. In this study, we extracted the important CT features of pelvic RMS by using LASSO logistic regression and established a quantitative diagnostic model for pelvic RMS by FDA. This diagnostic model is easy to operate, has a high diagnostic accuracy and can be applied in the diagnosis of pelvic RMS and its differentiation from other pelvic tumors in children.
Our study showed that a total of 8 CT features of RMS were significantly different from those of other pelvic malignancies. The differences in these CT features may be related to their pathological characteristics. Studies have shown that pelvic RMS in children is highly invasive and characterized by multicentric growth [35][36][37][38] . In the CT images, multiple nodules of different sizes were observed in the pelvic cavity, and some nodules fused together and formed a lobulated mass. Microscopic studies also found that some RMS tumor cells were distributed around the blood vessels and that the blood vessels were gradually surrounded by tumor cells as they grew 39,40 . This pathological phenomenon is consistent with the multiple vascular shadows shown in the contrast-enhanced CT images of pelvic RMS. Moreover, it has been shown that RMS contains abundant fibrous tissue and that the contrast agent gradually permeates into the tumor center over time, persisting in the tumor fibrous tissue for a long time 41,42 . This is possibly why RMS tends to have heterogeneous progressive centripetal enhancement. In addition, the rich mucus in RMS tumors could result in the CT feature of lower than normal muscle density, and rapid tumor growth with nutrient requirements exceeding the vessel supply could contribute to the CT feature of tumoral necrosis.
In this study, we selected 6 CT features with diagnostic value using LASSO logistic regression and established a quantitative diagnostic model of pelvic RMS in children through FDA.
Table 3. Sensitivity, specificity, AUC and 95% CI of the single characteristic features. Se sensitivity, Sp specificity, CI confidence interval, AUC area under the curve.
The advantage of this model is that it is simple to operate and easy to implement in routine imaging diagnosis. It is a simple mathematical formula: (RMS group) $G_1 = -14.283 + 6.613x_1 + 5.333x_2 + 5.753x_3 + 12.361x_4 + 8.095x_5 - 0.715x_6$; (non-RMS group) $G_2 = -2.008 + 3.539x_1 + 1.080x_2 + 1.154x_3 + 2.307x_4 + 1.656x_5 + 1.380x_6$, where $x_1$, $x_2$, …, $x_6$ are lower than normal muscle density (1 = yes; 0 = no), multinodular fusion (1 = yes; 0 = no), enhancement at surrounding blood vessels (1 = yes; 0 = no), heterogeneous progressive centripetal enhancement (1 = yes; 0 = no), ring enhancement (1 = yes; 0 = no), and hemorrhage (1 = yes; 0 = no), respectively. After the CT feature values were inserted into the model, the $G_1$ and $G_2$ values were calculated. By comparing the $G_1$ and $G_2$ values, subjects with unknown classification were classified according to the following principle: if $G_1 > G_2$, the subjects were classified into the RMS group; if $G_1 < G_2$, the subjects were classified into the non-RMS group. Clinicians or radiologists can therefore enter this formula into an Excel spreadsheet or build a small software tool to apply it. Both the ROC curve and cross-validation suggested that the model had a high diagnostic value. Our study also found that the AUC of the overall FDA model was higher than that of each CT characteristic feature. Cumulative FDA was carried out based on the importance of the features.
There was no significant improvement in the discrimination performance of the FDA model when using the first 4 features, the first 5 features and all 6 features (in the order of importance: heterogeneous progressive centripetal enhancement, enhancement at surrounding blood vessels, multinodular fusion, lower than normal muscle density, hemorrhage and ring enhancement). Therefore, the number of diagnostic CT features can be reduced to 4, and RMS can still be accurately diagnosed, which is beneficial for saving resources. The establishment of a diagnostic model allows us to go from image diagnosis based on human experiences to quantitative imaging diagnosis, which makes the diagnosis of pelvic RMS simpler and more accurate. In the future, we aim to use the CT diagnostic model of RMS to develop artificial intelligence diagnostic software for clinical practice. There were some limitations in this study. First, this was a retrospective study, which may have inherent selection bias. Second, the sample size was small. RMS is not a common disease in children; therefore, in future research, we need to further expand the sample size to improve the accuracy of our model. Finally, our study summarized only CT features, and the diagnostic value of MRI for RMS needs to be further studied.

Conclusions
Our study showed that pelvic RMS in children has some specific CT features. Furthermore, LASSO logistic regression is a reliable method for selecting diagnostic features of RMS. The FDA model based on CT morphological features can accurately diagnose pelvic RMS in children, with promising cross-validation performance. This diagnostic model could provide an easy and efficient method for the diagnosis and differential diagnosis of pelvic RMS in children.

Data availability
All data generated or analysed during this study are included in this article and its supplementary information files.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.